330
20.8
When Does the Computer Stop Calculating?
Question 8.1
Here some algorithms are compared in terms of their computation time, it results:
(a) RNAfold with small RNA and large RNA (quadratic increase with sequence)
(b) BLAST search (grows linearly with search sequence and database)
Short peptide example, long protein example. Search in the NRDB database, and only in
the human sequences (use species option).
The E-value moves favorably downward, toward smaller values, for a smaller database.
Why? Well, the larger the database, the higher the probability of random hits. So the
expected value (E-value) for a random, non-biological, non-relevant hit gets higher. So the
better I can narrow down where I expect my hit to be (e.g. a species-specific database), the
more significant and meaningful my result will be.
(a) Protein folding
This is an NP-hard problem, which means that the computation time increases many
times with each additional amino acid. It is thus not at all clear how long the computer will
take (non-polynomial complex problem), but at least if you get a solution you can deter
mine in polynomial time how good it is. Nevertheless, protein structures can be predicted
for many practical purposes, e.g. by comparison with known structures, e.g. with SWISS-
MODEL (but already here the answer comes only by e-mail, it takes time), or somewhat
more precisely, but more computationally expensive, with MODELLER or actually “ab
initio”, i.e. from the sequence, by folding, calculated by the Zhang lab (with QUARK etc.).
Question 8.2
A nice answer is given by this Youtube video, but unfortunately it is in English:
https://www.youtube.com/watch?v=SC5CX8drAtU.
Here are compared:
Greedy strategy: locally optimal choice at each stage; at each stage visit an unvisited
city nearest to the current city. This heuristic need not find a best solution, but terminates
in a reasonable number of steps; finding an optimal solution typically requires unreason
ably many steps. In mathematical optimization, greedy algorithms solve combinatorial
problems having the properties of matroids (a structure that captures and generalizes the
notion of linear independence in vector spaces).
Local search strategy: Local search is a generic term for a number of metaheuristic
search methods in combinatorial optimization. The methods are used in many variations
to solve complicated optimization problems approximately (e.g., the traveling salesman
problem). The basic principle is to find a better solution starting from a given initial
20 Solutions to the Exercises